Date:      Thu, 29 May 2008 20:45:43 +0200
From:      Gerrit Kühn <gerrit@pmp.uni-hannover.de>
To:        freebsd-stable@FreeBSD.ORG
Cc:        Oliver Fromme <olli@lurza.secnetix.de>
Subject:   Re: broken re(4)
Message-ID:  <20080529204543.d4aa927e.gerrit@pmp.uni-hannover.de>
In-Reply-To: <200805291652.m4TGqt2o060679@lurza.secnetix.de>
References:  <20080529171351.a3dd5111.gerrit@pmp.uni-hannover.de> <200805291652.m4TGqt2o060679@lurza.secnetix.de>

On Thu, 29 May 2008 18:52:55 +0200 (CEST) Oliver Fromme
<olli@lurza.secnetix.de> wrote about Re: broken re(4):

OF> In that case I would suspect that the one piece of hardware
OF> that is misbehaving is broken and needs to be replaced.

I agree. I just do not know yet which part is broken.

OF>  > The only hardware thing that is different in this system from the
OF>  > others is an additional SATA-controller. Can there be conflicts with
OF>  > this card which are triggering the problems?

OF> I think it's unlikely.  Do they share interrupts?  (The
OF> output of "vmstat -i" will tell you.)

protoserve# vmstat -i
interrupt                          total       rate
irq0: clk                       31564049       1000
irq7: ppbus0 ppc0                      1          0
irq8: rtc                        4038754        127
irq9: uhci0 uhci1+                     2          0
irq10: re0 re1+                  2401340         76
irq11: atapci0+++                 655498         20
irq14: ata0                        11167          0
Total                           38670811       1225


Just the two NICs on the same IRQ. A system that is working fine looks
like this:

firefly1# vmstat -i
interrupt                          total       rate
irq0: clk                     2614761182       1000
irq1: atkbd0                         902          0
irq7: ppbus0 ppc0                      1          0
irq8: rtc                      334559120        127
irq10: re0 re1+                 24354774          9
irq11: atapci0++++                 70905          0
irq14: ata0                       800110          0
Total                         2974546994       1138
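A minimal sketch for spotting shared interrupt lines in listings like the two above. It assumes the FreeBSD 7.x layout shown here ("irqN: name [name ...] total rate"), where a trailing "+" on the last name means further handlers share the line. The function name shared_irqs is hypothetical.

```shell
#!/bin/sh
# Sketch: print the IRQs that "vmstat -i" reports as shared by more than
# one handler. Assumes the FreeBSD 7.x layout:
#   irqN: name [name ...] total rate
# where a trailing "+" on the last name means more handlers share the line.
shared_irqs() {
    awk '
        /^irq[0-9]+:/ {
            ndev = NF - 3          # fields between "irqN:" and the two counters
            last = $(NF - 2)       # last handler name on the line
            if (ndev > 1 || last ~ /\+$/) {
                irq = $1
                sub(/:$/, "", irq) # drop the trailing colon
                print irq
            }
        }
    '
}

# On the FreeBSD box itself: vmstat -i | shared_irqs
# Here fed with three lines from the protoserve listing; prints irq10 and irq11.
shared_irqs <<'EOF'
irq0: clk                       31564049       1000
irq10: re0 re1+                  2401340         76
irq11: atapci0+++                 655498         20
EOF
```

Counting from the right keeps the sketch independent of how many handler names fit on a line.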

OF> In theory it could also be a power supply problem.  I
OF> assume that you use rather small (thus possibly weak)
OF> power supplies for your ITX machines.  Maybe the SATA
OF> controller in that problematic machine drives the power
OF> supply to its limit, and the re(4) interfaces suffer.
OF> You could check whether removing the SATA controller
OF> improves things.  Or try to connect a stronger power
OF> supply if you have one available.

These machines are in Travla C146/C147 chassis and use the power supplies
that come with them.
However, the definitive test for controller-related problems is simply to
remove the controller. I will try this tomorrow (the systems are at work,
and I am at home now - can't unplug a controller via ssh :-).

OF>  - Do you see any non-zero numbers in the collision or
OF>    error columns of "netstat -i"?

No:

protoserve# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
re0    1500 <Link#1>      00:30:18:af:19:6a   131032     0   271757     0     0
re0    1500 10.117.0.0    protoserve           80442     -   271722     -     -
re1    1500 <Link#2>      00:30:18:af:19:6b  1474484     0  1114542     0     0
re1    1500 192.168.0.0   192.168.2.1        1471156     -  1114457     -     -
plip0  1500 <Link#3>                               0     0        0     0     0
lo0   16384 <Link#4>                               0     0        0     0     0
lo0   16384 fe80:4::1     fe80:4::1                0     -        0     -     -
lo0   16384 localhost     ::1                      0     -        0     -     -
lo0   16384 your-net      localhost                0     -        0     -     -
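The same check can be scripted. This is a sketch that assumes the 7.x column order quoted above (Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll). Newer netstat versions may add columns, so treat the field positions as an assumption. The function name netstat_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: report interfaces whose "netstat -i" error or collision counters
# are non-zero. Assumes the 7.x column order
# (Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll).
netstat_errors() {
    awk '
        # Only the <Link#N> rows carry per-interface counters; the
        # protocol rows print "-" in the error columns.
        $3 ~ /^<Link#/ {
            # Index from the right: the Address column is empty for
            # plip0 and lo0, which shifts the fields counted from the left.
            ierrs = $(NF - 3); oerrs = $(NF - 1); coll = $NF
            if (ierrs + oerrs + coll > 0)
                printf "%s: Ierrs=%s Oerrs=%s Coll=%s\n", $1, ierrs, oerrs, coll
        }
    '
}

# On the FreeBSD box: netstat -i | netstat_errors
# With these protoserve rows it prints nothing.
netstat_errors <<'EOF'
re0    1500 <Link#1>      00:30:18:af:19:6a   131032     0   271757     0     0
re1    1500 <Link#2>      00:30:18:af:19:6b  1474484     0  1114542     0     0
plip0  1500 <Link#3>                               0     0        0     0     0
EOF
```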

OF>  - Are you sure the interfaces don't have the same MAC
OF>    addresses (it's unlikely, but it doesn't hurt to check
OF>    in the ifconfig output).

Yes:

protoserve# ifconfig
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
        ether 00:30:18:af:19:6a
        inet 10.117.15.1 netmask 0xffff0000 broadcast 10.117.255.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
        ether 00:30:18:af:19:6b
        inet 192.168.2.1 netmask 0xffff0000 broadcast 192.168.255.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
        inet6 ::1 prefixlen 128 
        inet 127.0.0.1 netmask 0xff000000 
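A quick way to confirm this on other machines is a sketch like the following. It assumes only that each MAC appears on an "ether" line, as in the output above. The function name dup_macs is hypothetical.

```shell
#!/bin/sh
# Sketch: print any MAC address that occurs on more than one "ether" line
# of ifconfig output; no output means every address is unique.
dup_macs() {
    # Lower-case the addresses so 00:30:18:AF:... and 00:30:18:af:... compare equal.
    awk '$1 == "ether" { print tolower($2) }' | sort | uniq -d
}

# On the FreeBSD box: ifconfig | dup_macs
# With the two protoserve NICs it prints nothing.
dup_macs <<'EOF'
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:30:18:af:19:6a
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:30:18:af:19:6b
EOF
```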

OF>  - Are you sure that media and duplex settings are
OF>    correct on both sides (i.e. PC and switch)?

The systems are all on the same switch (I also swapped the switch during the tests, with no difference), and all devices show a 1 Gbit/s link.

OF>  - Have you tried replacing cables, switch ports, or the
OF>    whole switch?

Yes, all of that.

OF>  - Have you tried to disable hardware support features
OF>    of the driver?  In 7-stable re(4) supports quite a lot
OF>    of hardware features.  See "ifconfig -m".  You could
OF>    check whether disabling RXCSUM, TXCSUM and/or TSO4
OF>    makes a difference.

Another good idea, thanks. I will try that tomorrow, too.
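A minimal sketch of that test. It uses the standard FreeBSD ifconfig flags -rxcsum, -txcsum and -tso; "ifconfig -m re0" shows which capabilities the driver supports before anything is changed. The function name disable_offload and the DRY_RUN variable are hypothetical. By default the commands are only printed. The settings do not survive a reboot, and dropping one flag at a time would show which feature, if any, is at fault.

```shell
#!/bin/sh
# Sketch: switch off RXCSUM, TXCSUM and TSO4 on the given interfaces.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0
# to actually run them (needs root on the FreeBSD box).
disable_offload() {
    for ifc in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo ifconfig "$ifc" -rxcsum -txcsum -tso
        else
            ifconfig "$ifc" -rxcsum -txcsum -tso
        fi
    done
}

# Prints one ifconfig command per interface.
disable_offload re0 re1
```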


cu
  Gerrit


