Date:      Sun, 2 Aug 2009 09:54:40 -0700 (PDT)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        freebsd-net@freebsd.org, alexpalias-bsdnet@yahoo.com
Subject:   Re: em driver input errors
Message-ID:  <210006.36085.qm@web63904.mail.re1.yahoo.com>
In-Reply-To: <11420.28890.qm@web56404.mail.re3.yahoo.com>


--- On Sat, 8/1/09, alexpalias-bsdnet@yahoo.com <alexpalias-bsdnet@yahoo.com> wrote:

> From: alexpalias-bsdnet@yahoo.com <alexpalias-bsdnet@yahoo.com>
> Subject: em driver input errors
> To: freebsd-net@freebsd.org
> Date: Saturday, August 1, 2009, 9:05 AM
> Good day
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of
> input errors on one of the em interfaces (em0), coupled with
> (at approximately the same times) much fewer errors on em1
> and em2. Monitoring is done with SNMP from another
> machine, and the CPU load as reported via SNMP is mostly
> below 30%, with a couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see
> pciconf near the end]
>
>
> The machine receives the global routing table ("netstat -nr
> | wc -l" gives 289115 currently).
>
> All of the em interfaces are just configured "up", with
> various vlan interfaces on them. Note that I use "kpps" to
> mean "thousands of packets per second", sorry if that's the
> wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps
> out. In bits, it's 30...120Mbits/s in, and
> 100...210Mbits/s out. Vlans configured are vlan100 and
> vlan200, and most of the traffic is on vlan100 (vlan200 sees
> 4kpps in / 0.5kpps out maximum, with the average at about
> one third of this). em0 is the external interface, and its
> traffic corresponds to the sum of traffic through em1 and
> em2
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out
> (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in
> and out (almost equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only
> one which has seen no errors.
>
> Only the vlans on em0 are analyzed by ng_netflow, and the
> errors I'm seeing have started appearing days before
> netgraph was even loaded in the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing way more errors, now they
> are reduced, but still come in bursts of over 1000 errors on
> em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
>
> Still seeing errors, after some searching the mailing lists
> we also added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em
> interfaces, still getting errors.
>
> Looking at the shape of the error and packet graphs, there
> seems to be a correlation between the number of packets per
> second on em0 and the height of the error "spikes" on the
> error graph. These spikes are spread throughout the day,
> with spaces (zones with no errors) of various lengths (10
> minutes ... 2 hours spaces within the last 24 hours), but
> sometimes there are errors even in the lowest kpps times of
> the day.
>
> em0 and em1 error times are correlated, with all errors on
> the graph for em0 having a smaller corresponding error spike
> on em1 at the same time, and sometimes an error spike on
> em2.
>
> The old router was seeing about the same traffic, and had
> em0, em1, re0 and re1 network cards, and was only seeing
> errors on the em cards. It was running
> 7.2-PRERELEASE/i386
>
>
> Any suggestions would be greatly appreciated. Please note
> that this is a live router, and I can't reboot it (unless
> absolutely necessary). Tuning that can be applied without
> rebooting will be tried first.
>
> Here are some more details:
>
> Trimmed output of netstat -ni (sorry if there are line
> breaks):
> Name    Mtu Network   Address                  Ipkts    Ierrs       Opkts Oerrs  Coll
> em0    1500 <Link#1>  00:14:22:xx:xx:xx  19744458839 15494721 24284439443     0     0
> em1    1500 <Link#2>  00:14:22:xx:xx:xx  12832245469   123181 10105031790     0     0
> em2    1500 <Link#3>  00:04:23:xx:xx:xx  12082552403    10964 10339416865     0     0
> em3    1500 <Link#4>  00:04:23:xx:xx:xx     79912337        0    48178737     0     0
>
> Relevant part of pciconf -vl:
>
> em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82541EI Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
> em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82541EI Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
> em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class      = network
>     subclass   = ethernet
> em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class      = network
>     subclass   = ethernet
>
> Kernel messages after sysctl dev.em.0.stats=1:
> (note that I've removed the lines which only showed zeros
> in the second and third outputs)
>
> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 0
> em0: Missed Packets = 15435312
> em0: Receive No Buffers = 16446113
> em0: Receive Length Errors = 0
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 96826
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 0
> em0: XON Xmtd = 0
> em0: XOFF Rcvd = 0
> em0: XOFF Xmtd = 0
> em0: Good Packets Rcvd = 19002068797
> em0: Good Packets Xmtd = 23168462599
> em0: TSO Contexts Xmtd = 0
> em0: TSO Contexts Failed = 0
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15459111
> em0: Receive No Buffers = 16447082
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96835
> em0: Good Packets Rcvd = 19165047284
> em0: Good Packets Xmtd = 23386976960
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15470583
> em0: Receive No Buffers = 16447686
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96840
> em0: Good Packets Rcvd = 19255466068
> em0: Good Packets Xmtd = 23519004546
>

Note that "most" PCI-X motherboards wire onboard NICs to 32 bits and
33 MHz, mainly because it's apparently easier to do so. It's likely
that your add-on card is running at 64 bits and 133 MHz.

32 bits/33 MHz isn't really fast enough to manage gigabit traffic
flows, as its max burst is only about 1 Gb/s, so you really can't use
such NICs for any sort of primary traffic flow. Check with your MB
manufacturer, as they usually don't advertise it.

Barney
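
For reference, the bus arithmetic behind the "1 Gb/s" figure can be sketched as below. This is only a back-of-the-envelope illustration: the function name bus_peak_gbps is made up for this note, the clock rates are the nominal 33 and 133 MHz figures mentioned above, and the result is a theoretical peak that ignores arbitration, descriptor fetches and other bus overhead.

```python
def bus_peak_gbps(width_bits, clock_mhz):
    """Theoretical peak rate of a parallel bus in Gb/s (hypothetical helper).

    Assumes one transfer of `width_bits` per clock and no protocol overhead.
    """
    return width_bits * clock_mhz * 1e6 / 1e9


# Nominal figures from the message: onboard NIC vs. PCI-X add-on card.
onboard = bus_peak_gbps(32, 33)    # 32-bit / 33 MHz
addon = bus_peak_gbps(64, 133)     # 64-bit / 133 MHz

print(f"32-bit/33 MHz : {onboard:.3f} Gb/s peak")
print(f"64-bit/133 MHz: {addon:.3f} Gb/s peak")
print(f"ratio         : {addon / onboard:.1f}x")
```

A gigabit port running full duplex can move up to 1 Gb/s in each direction, so a peak of roughly 1 Gb/s for receive and transmit combined leaves no headroom once real bus overhead is counted.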

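To put the three dev.em.0.stats dumps quoted above in proportion, the counter deltas between dumps can be worked out as in this rough sketch. The helper name interval_loss is hypothetical, and treating missed / (missed + good) as a loss fraction is an assumption made for this note, not something the driver reports. The dumps carry no timestamps, so only ratios can be derived, not per-second rates.

```python
# Counters copied from the three "sysctl dev.em.0.stats=1" dumps quoted above.
snapshots = [
    {"missed": 15435312, "good_rcvd": 19002068797},
    {"missed": 15459111, "good_rcvd": 19165047284},
    {"missed": 15470583, "good_rcvd": 19255466068},
]


def interval_loss(prev, curr):
    """Return (missed delta, good delta, missed fraction) between two dumps."""
    missed = curr["missed"] - prev["missed"]
    good = curr["good_rcvd"] - prev["good_rcvd"]
    return missed, good, missed / (missed + good)


# Compare each dump with the one taken before it.
intervals = [interval_loss(a, b) for a, b in zip(snapshots, snapshots[1:])]
for missed, good, frac in intervals:
    print(f"missed={missed} good={good} loss={frac:.4%}")
```

Averaged over each interval the missed share is a small fraction of a percent, which fits the original description of the errors arriving in short bursts rather than as a steady loss.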

