Date:      Wed, 19 Aug 2009 05:52:23 -0700 (PDT)
From:      alexpalias-bsdnet@yahoo.com
To:        Дмитрий Замураев <gigabyte.tmn@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   RE: em driver input errors
Message-ID:  <24727.68667.qm@web56404.mail.re3.yahoo.com>
In-Reply-To: <001401ca1f4d$e96a2170$1e010a0a@in72.ru>

Greetings.

--- On Mon, 8/17/09, Дмитрий Замураев <gigabyte.tmn@gmail.com> wrote:

> From: Дмитрий Замураев <gigabyte.tmn@gmail.com>
> Subject: RE: em driver input errors
> To: alexpalias-bsdnet@yahoo.com
> Cc: freebsd-net@freebsd.org
> Date: Monday, August 17, 2009, 6:17 PM
>
> >/boot/loader.conf:
> >hw.em.rxd=4096
> >hw.em.txd=4096
> why are you using these values? try the defaults (without these
> lines in loader.conf)

As said in my original email, I was getting way more errors with the defaults.

> > Without the above we were seeing way more errors, now they are
> > reduced, but still come in bursts of over 1000 errors on em0.
> >Still seeing errors, after some searching the mailing lists we
> >also added:
> ># the four lines below are repeated for em1, em2, em3
> >dev.em.0.rx_int_delay=0
> >dev.em.0.rx_abs_int_delay=0
> >dev.em.0.tx_int_delay=0
> >dev.em.0.tx_abs_int_delay=0
> try to increase rx_int_delay to 600 and rx_abs_int_delay to 1000,
> tx_*_delay without changes -> by default (100?)

Thanks for the suggestion.
From a "clean" box:
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66

I reset all the values (errors still appearing), then tried your suggestion (rx_int_delay=600, rx_abs_int_delay=1000).  This reduced the number of interrupts for em0 (from about 7200/sec to around 6500/sec).  After some time, I started getting errors again.  But that made me try this as well:

dev.em.0.tx_int_delay=600
dev.em.0.tx_abs_int_delay=1000

Meaning I used your suggested values for tx too.  Now em0 is seeing about 1800 interrupts/second, which is way better, but after some time I saw errors again...

From the output of "netstat -nI em0 -w 5":

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     87267     0   50372599     106931     0   81598993     0
     86496     0   50990332     105467     0   80064657     0
     81726  3056   49876613      99080     0   73273640     0
     90425     0   59172531     105299     0   77110096     0
    120292     0   70369292     109597     0   78626248     0
... a few minutes pass with zero errors ...
     89646     0   56951878     111240     0   86493393     0
     86031     0   53549721     108695     0   83592747     0
     77760  3054   48505562      96912     0   73185576     0
     87508     0   56116394     106094     0   79130608     0
     89031     0   56490982     103039     0   77398567     0

What's interesting is that I'm seeing errors in an 80k packets/5 sec zone (so around 16k packets/s), but no errors at 120k packets/5 sec (24 kpps).


Currently, I've set the delay to 600 and abs_delay to 1000 on all interfaces (em0, em1, em2, em3), thus reducing the number of interrupts.
I'm currently seeing (in systat -vmstat 2):
Around 1800 irqs/s for em0, 1800 for em1, 1800 for em2, under 10/s for em3
Around 2000 irqs/s for cpu0:time, 2000 more for cpu1:time, 2000 for cpu2:time and 2000 for cpu3:time.

Interrupts total (as reported by systat):  around 13500/second.  I would estimate the old IRQ load at around 30000-35000/second, which doesn't seem too much to me for a dual Xeon machine.

> >kern.ipc.nmbclusters=655360
> no need. see netstat -m

Thanks, but as I said, I did try almost *EVERYTHING* I could without rebooting.  Including this.

Speaking of which, I did compile the kernel with "options DEVICE_POLLING", but enabling polling only made the errors appear more often, and in greater numbers.

> P.S. change copper cable, turn off the flow-control (if is on)

There are 4 em interfaces on this machine, with new cat6 cables.  2 more em interfaces on another machine that was seeing the same errors (the old router), on different cables.  And 2 more em interfaces on another machine that's in production, also with new cables.  Of the input errors (as debugged by sysctl dev.em.0.stats=1 -> read dmesg), only 2 are due to CRC errors, as opposed to around 2.500.000 from other causes.  I tend to feel the cable isn't the problem.

Flow control is off, I just checked.  I forgot about that one, thanks for reminding me.


Thank you for your help
Alex
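
For anyone wanting to repeat the rate arithmetic on the netstat samples, here is a minimal sketch in Python. It assumes the seven-column layout that "netstat -nI em0 -w 5" printed for this interface and a 5-second interval; the names parse_netstat and SAMPLE are hypothetical helpers made up for illustration, not part of any FreeBSD tool.

```python
# Sketch: turn "netstat -nI em0 -w 5" rows into per-second input rates
# and pick out the intervals that reported input errors.
# Assumption: each data row has exactly seven numeric columns
# (in packets, in errs, in bytes, out packets, out errs, out bytes, colls).

SAMPLE = """\
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     87267     0   50372599     106931     0   81598993     0
     86496     0   50990332     105467     0   80064657     0
     81726  3056   49876613      99080     0   73273640     0
     90425     0   59172531     105299     0   77110096     0
    120292     0   70369292     109597     0   78626248     0
"""


def parse_netstat(text, interval=5):
    """Return one dict per data row; header and comment lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows made of exactly seven integers.
        if len(fields) != 7 or not all(f.isdigit() for f in fields):
            continue
        in_pkts, in_errs = int(fields[0]), int(fields[1])
        rows.append({
            "in_pps": in_pkts / interval,  # input packets per second
            "in_errs": in_errs,            # input errors in this interval
        })
    return rows


if __name__ == "__main__":
    for row in parse_netstat(SAMPLE):
        flag = "  <-- errors" if row["in_errs"] else ""
        print(f"{row['in_pps']:10.1f} pps  {row['in_errs']:6d} errs{flag}")
```

On the excerpt above, the only interval with errors is the one at roughly 16k input packets/s, while the busiest interval (roughly 24k packets/s) is clean, which matches the observation in the message.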


