Date:      Wed, 14 Jun 2000 10:37:52 -0700
From:      David Greenman <dg@root.com>
To:        les@safety.net
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Conflict between Intel 82558/9 and VIA MVP4? 
Message-ID:  <200006141737.KAA06177@implode.root.com>
In-Reply-To: Your message of "Wed, 14 Jun 2000 09:03:24 PDT." <200006141603.JAA80849@ns3.safety.net> 

>We're having problems with the Intel EtherExpress 10/100 NICs in our
>product platform.  We suspect unfavorable interaction between the 82558
>and 82559 Intel parts and our motherboard chipset.  Here are some
>specifics:
>
>We're using 3.4-STABLE, with the "latest" fxp driver code:
>
> $FreeBSD: src/sys/pci/if_fxp.c,v 1.59.2.7 2000/04/01 19:04:21 dg Exp $
> $FreeBSD: src/sys/pci/if_fxpreg.h,v 1.13.2.3 1999/12/06 20:11:53 peter Exp $
> $FreeBSD: src/sys/pci/if_fxpvar.h,v 1.6.2.2 2000/04/01 19:04:22 dg Exp $
>
>The platform is a small PC designed for the point of sale folks, and uses
>the VIA Apollo MVP4 chipset.  From dmesg:
>
>  chip0: <Host to PCI bridge (vendor=1106 device=0501)> rev 0x02 on pci0.0.0
>  chip1: <PCI to PCI bridge (vendor=1106 device=8501)> rev 0x00 on pci0.1.0
>  chip2: <PCI to ISA bridge (vendor=1106 device=0686)> rev 0x14 on pci0.7.0
>  chip3: <Host to PCI bridge (vendor=1106 device=3057)> rev 0x10 on pci0.7.4
>
>We use an AMD K6-2 at 350 or 450 MHz, 32MB of RAM and boot from Compact Flash.
>
>The two PCI slots are on a riser card.  On the riser card is a RealTEK
>8139 10/100 interface which works quite well:
>
>  rl0: <RealTek 8139 10/100BaseTX> rev 0x10 int a irq 12 on pci0.13.0
>
>We can install other RealTEK-based NICs in either or both riser card PCI
>slots, and they work well, as do WAN cards.  The problem comes when we
>install a NIC based on the Intel 82558 or 82559 parts.
>
>When the NIC is in the "top" slot on the riser (pci0.1.19), the kernel
>panics in if_fxp.c at fxp_add_rfabuf + 0xc4.  The backtrace says
>fxp_add_rfabuf was called from fxp_intr.

   That definitely sounds like a hardware problem - an electrical problem,
perhaps noise related, on the PCI bus.

>With the NIC in the "bottom" slot (pci0.1.17), there is no panic, but the
>card gets choked up and seems not to listen reliably.  For example, it
>will hear an ARP reply if it sent the ARP request, but will ignore an
>ARP request inbound.  My sniffer shows the packets on the link, but there
>is no indication in a "netstat -i" that the NIC saw them.

   That could be caused by a number of things. It could be another
manifestation of the problem above, it could be that the duplex isn't
being negotiated properly, or it could be something altogether different.

>Further watching of a "netstat -i -w 1" display shows something very
>puzzling and troubling.  When the card _is_ working, the transmitted and
>received byte counts get updated in the display, but the associated
>packet counts don't go up for one or two seconds.  When the card is NOT
>working right (doesn't hear), the bytes-received counts will increment
>and the packets-received counts WON'T.

   The stats on the NIC are only read once per second for the packet count,
but the byte count is updated as soon as the packet is sent or received. This
can cause a one-second delay. I can't explain a two-second delay other than
that the DMA (to complete the stats transfer) is extremely slow for some
reason.
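
   To illustrate the lag, here is a minimal Python sketch. It is not driver
code; it only models the input side under the assumption that bytes are
counted in the second they arrive while the packet count shows up a fixed
number of seconds later. The names netstat_rows and stats_lag, and the
traffic list, are made up for illustration.

```python
def netstat_rows(traffic, stats_lag=1):
    """Model a per-second display of (packets, bytes).

    traffic   -- list of (packets, bytes) actually received in each second
    stats_lag -- assumed delay, in seconds, before a packet count is shown
    """
    rows = []
    for t, (_, nbytes) in enumerate(traffic):
        src = t - stats_lag
        # Packet count comes from the periodic stats read, so it is late.
        pkts = traffic[src][0] if src >= 0 else 0
        # Byte count is credited in the second the traffic arrived.
        rows.append((pkts, nbytes))
    return rows


# Hypothetical traffic: a 71-byte broadcast every 3 seconds, plus a
# 100-packet, 9800-byte ping burst in the third second.
traffic = [(1, 71), (0, 0), (100, 9800), (1, 71), (0, 0), (0, 0), (1, 71)]

if __name__ == "__main__":
    for pkts, nbytes in netstat_rows(traffic, stats_lag=2):
        print(f"{pkts:8d} {nbytes:10d}")
```

   With stats_lag=1 each packet count trails its bytes by one row, which is
the expected behavior. With stats_lag=2 the packet counts trail by two rows,
which is the pattern in the display quoted below.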

>Here's the display for a "working" NIC on a quiet subnet that has a
>single machine sending broadcasts every 3 seconds and a quick 100-packet
>flood ping of that machine.  Note the two second delay before the packet
>counts catch up:
>
>            input         (fxp0)           output
>   packets  errs      bytes    packets  errs      bytes colls
>         1     0          0          0     0          0     0
>         0     0         71          0     0          0     0
>         0     0          0          0     0          0     0
>         1     0       9800          0     0       9800     0
>         0     0         71          0     0          0     0
>       100     0          0        100     0          0     0
>         1     0          0          0     0          0     0
>         0     0         71          0     0          0     0
>         0     0          0          0     0          0     0
>         1     0          0          0     0          0     0
>
>Our mbuf levels are hitting really high peaks, and I suspect that
>whatever is hanging onto the packets is responsible for that.  Other
>NICs in the same situation (including the much maligned RealTEK) don't
>exhibit this symptom, and don't run up our peak mbufs.
>
>In addition to causing massive peaks, the Intel NICs do something else
>ugly.  It appears that they get choked up when they can't get rid of
>queued outputs as quickly as they would like.  A 10Mbps shared-media
>segment will have many many collisions when transferring a file or doing
>a flood ping between two fast FreeBSD boxes, and a bunch of the queued
>output mbufs wind up in limbo.  Changing to a full-duplex 100Mbps
>connection between the boxes eliminates the buffer-loss problem, but
>does not stop the NIC from having its receive or panic problems.  We
>see the mbuf peak symptoms on other motherboards as well, but not the
>ignored received packets.

   Peak mbuf levels really aren't relevant to the problem. The fxp driver
holds onto mbufs as part of an optimization to reduce interrupt overhead.
It shouldn't peak higher than about 250 or so, however, if things are
working correctly.
   It sounds to me as though there is a serious problem with the DMA not
operating properly. I'm wondering if the Apollo chipset doesn't support
some PCI operation that the Pro/100 wants, causing major problems.
Unfortunately I don't believe that there is anything that can be done
in the driver to work around this. What you need is someone with a PCI
bus analyzer to look into the behavior on the bus more closely. You may
wish to look for any BIOS settings that might affect the DMA - things
like write buffering, burst size, etc., and tweak those to see if
you can affect the behavior.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
Manufacturer of high-performance Internet servers - http://www.terasolutions.com
Pave the road of life with opportunities.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



